MRG: Autobuild features #198

Merged
bambooforest merged 24 commits into phoible:master from drammock:autobuild-features on Mar 10, 2019

@drammock
Member

@drammock drammock commented Feb 16, 2019

closes #192
closes #6
closes #7

@drammock drammock mentioned this pull request Feb 19, 2019
@drammock drammock force-pushed the autobuild-features branch from c477be9 to c977f65 Compare March 3, 2019 10:53
@drammock
Member Author

drammock commented Mar 3, 2019

@bambooforest this is getting very close. I've thrown a few different changes into this PR (which normally I don't like to do, but it made things a lot easier), but the commit messages should give you a fairly clear picture. The # of unique segments with NA features is down to 13; if you run the scripts/add-features.R script it'll generate data/glyphs-with-na-feats.csv if you wanna see what remains. I'll try to knock out those last few in the next day or two and push to this PR; I'll ping you when that happens, but feel free to have a look now if you want since there are a lot of changes.

@bambooforest
Contributor

@drammock this is great. And it's only 5 languages.

FYI: I get an error ("Execution halted"), although the file data/glyphs-with-na-feats.csv is created:

$ Rscript add-features.R
Warning in FUN(X[[i]], ...) : dʒx is not in special_feats table.
Warning in FUN(X[[i]], ...) : ɡbr is not in special_feats table.
Warning in FUN(X[[i]], ...) : kpr is not in special_feats table.
Warning in FUN(X[[i]], ...) : dʒɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : ɡbɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : kpɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : ntʃɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : ŋmkpɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : tʃɾ is not in special_feats table.
Warning in FUN(X[[i]], ...) : ɲdʒ is not in special_feats table.
Warning in FUN(X[[i]], ...) : nɖʐ is not in special_feats table.
Warning in FUN(X[[i]], ...) : tsɦ is not in special_feats table.
Warning in FUN(X[[i]], ...) : tsɦ is not in special_feats table.
Warning in FUN(X[[i]], ...) : tsɦ is not in special_feats table.

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Error:
Execution halted

In the list that it creates, are some of these errors that we can clean up in the input? E.g.:

tsɦ, t̪s̪ɦ|tsɦ in UPSID, e.g. "voiceless aspirated alveolar sibilant affricate with breathy release"

https://github.com/phoible/dev/blob/master/raw-data/UPSID/UPSID_IPA_correspondences.tsv#L386

ɲd̠ʒ should be n̠d̠ʒ according to our conventions, since it is reported as prenasalized post-alveolar / palatal, eh?

nɖʐ in 2231 Xumi should be ɳɖʐ for the pre-nasalized retroflex, as I understand it.

This is from !Xun d̠ʒxʼ which we don't seem to be able to encode with features for all of click consonants anyway...

ɡbr, kpr in 1424 Morokodo are reported in the source.

d̠ʒɾ, ɡbɾ, kpɾ, ŋmkpɾ, n̠t̠ʃɾ, t̠ʃɾ in 1630 Mbembe are reported in the source.
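If the input fixes suggested above turn out to be right, one way to apply them before the feature lookup is a small correction table. This is only a sketch, not code from the PR: `from`, `to`, and `apply_corrections` are hypothetical names, and the two pairs are just the proposals from this comment, which have not been confirmed.

```r
# Hypothetical correction table: from[i] is the segment as entered,
# to[i] is the proposed replacement. Unicode escapes are used so the
# code does not depend on the file encoding.
from <- c(
  "n\u0256\u0290",        # nɖʐ  (Xumi, as entered)
  "\u0272d\u0320\u0292"   # ɲd̠ʒ  (as entered)
)
to <- c(
  "\u0273\u0256\u0290",   # ɳɖʐ  (proposed: retroflex nasal)
  "n\u0320d\u0320\u0292"  # n̠d̠ʒ  (proposed: retracted n)
)

# Replace every segment found in `from` with its counterpart in `to`;
# segments without a correction are returned unchanged.
apply_corrections <- function(segments, from, to) {
  idx <- match(segments, from)
  hit <- !is.na(idx)
  segments[hit] <- to[idx[hit]]
  segments
}

segments <- c("n\u0256\u0290", "kpr")
corrected <- apply_corrections(segments, from, to)
```

Segments that are reported in the source (e.g. the Morokodo and Mbembe ones) pass through untouched, so they would still need entries in the feature tables.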

@drammock drammock force-pushed the autobuild-features branch from c977f65 to 4673a73 Compare March 6, 2019 03:34
@drammock drammock force-pushed the autobuild-features branch from 4673a73 to e746dae Compare March 6, 2019 06:28
@drammock drammock changed the title WIP: Autobuild features MRG: Autobuild features Mar 6, 2019
@drammock drammock requested a review from bambooforest March 6, 2019 06:28
@drammock
Member Author

drammock commented Mar 6, 2019

@bambooforest sorry for the late commit... realized when I woke up that I forgot to re-run the agg and feat-builder scripts after the rebase. also just found another minor tweak to do.

@drammock drammock removed the request for review from bambooforest March 6, 2019 18:38
@drammock drammock requested a review from bambooforest March 6, 2019 22:41
bambooforest added a commit to bambooforest/phoible-scripts that referenced this pull request Mar 8, 2019
@bambooforest
Contributor

@drammock i updated the space modifier order here:

08bdd37

should i push directly somewhere else? to your branch? to this PR?

this still doesn't put the length marker after the tones. it wasn't immediately clear to me where these are getting reordered. here?

https://github.com/drammock/phoible/blob/autobuild-features/scripts/aggregation-helper-functions.R#L289

also, should we reorganize the diacritics according to the conventions as well? here's one suggestion just going from the top down:

diacritics <- c(
"̴", # velarized/pharyngealized (combining tilde overlay)
"̼", # linguolabial (combining seagull below)
"̪", # dental (combining bridge below)
"̺", # apical (combining inverted bridge below)
"̻", # laminal (combining square below)
"̟", # advanced (combining plus sign below)
"̠", # retracted (combining minus sign below)
"̝", # raised (combining up tack below)
"̞", # lowered (combining down tack below)
"̘", # advanced tongue root (combining left tack below)
"̙", # retracted tongue root (combining right tack below)
"͓", # frictionalized (combining x below)
"̹", # more round (combining right half ring below)
"̜", # less round (combining left half ring below)
"̮", # derhoticized (combining breve below)
"̰", # creaky (combining tilde below)
"̤", # breathy (combining diaeresis below)
"̥", # devoiced (combining ring below)
"̊", # devoiced (combining ring above)
"͇", # non-sibilant (combining equals sign below)
"͈", # fortis (combining double vertical line below)
"͉", # lenis (combining left angle below)
"̬", # stiff (combining caron below)
"̩", # syllabic (combining vertical line below)
"̯", # non-syllabic (combining inverted breve below)
"̃", # nasalized (combining tilde)
"͊", # denasalized (combining not tilde above)
"͋", # nasal emission (combining homothetic)
"̈", # centralized (combining diaeresis)
"̽", # mid-centralized (combining x above)
"̆", # short (combining breve)
"̚" # unreleased (combining left angle above)
)

but I noticed that we don't mention these in the order part of the conventions (or at all, including stiff, denasalized, and nasal emission):

"͊", # denasalized (combining not tilde above)
"͋", # nasal emission (combining homothetic)
"̬", # stiff (combining caron below)
"͇", # non-sibilant (combining equals sign below)
"͈", # fortis (combining double vertical line below)
"͉", # lenis (combining left angle below)
"̆", # short (combining breve)
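To show what a vector like `diacritics` buys you, here is a minimal sketch of sorting a segment's diacritics by their position in such a vector. It is not the `orderIPA` implementation: `diacritic_order` is a three-item illustrative subset, `reorder_diacritics` is a hypothetical helper, and it assumes a segment made of base glyph(s) followed by diacritics, with no tones or spacing modifiers.

```r
# Illustrative subset of a preferred order (earlier = closer to the base glyph).
diacritic_order <- c(
  "\u032A",  # dental (combining bridge below)
  "\u0324",  # breathy (combining diaeresis below)
  "\u0303"   # nasalized (combining tilde)
)

# Split a segment into code points, keep everything that is not a listed
# diacritic in its original order, then append the listed diacritics
# sorted by their rank in `preferred`.
reorder_diacritics <- function(segment, preferred) {
  chars <- intToUtf8(utf8ToInt(enc2utf8(segment)), multiple = TRUE)
  rank <- match(chars, preferred)
  is_diac <- !is.na(rank)
  base <- chars[!is_diac]
  diac <- chars[is_diac][order(rank[is_diac])]
  paste(c(base, diac), collapse = "")
}

# "a" + nasalized + breathy comes back as "a" + breathy + nasalized.
reordered <- reorder_diacritics("a\u0303\u0324", diacritic_order)
```

Because the order lives in one vector, changing the convention only means reordering that vector, which is why agreeing on it (and on the unlisted diacritics) matters.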

@drammock
Member Author

drammock commented Mar 8, 2019

I suggest reviewing / merging this one and doing the diacritic reordering in a different PR. This one already addresses too many extra issues that aren't part of its explicit goal of just making the feature building work, and I guess I don't see unpreferred diacritic ordering as a critical, time-sensitive flaw on the same level as missing features.

I should note that the orderIPA function does more than we need it to (i.e., it handles cases where base glyphs, diacritics, modifiers, and tone are all present in a single segment). Given that we rigorously segregate tonemes from phonemes, the section of code you link to is probably not needed, and probably deleting it would give you the result you want (tones before diacritics in toneme segments).

As for the length marker, I'm fine with moving it to the end. I don't think it makes a meaningful difference; I originally put it early in the ordering because typographically I think it looks better. But I disagree with some of the other ordering choices in 08bdd37. Please open a new PR (preferably after merging this one) and we can hash out those decisions there.

bambooforest added a commit to bambooforest/phoible-scripts that referenced this pull request Mar 9, 2019
@bambooforest
Contributor

@drammock -- I'm ok with leaving the diacritic ordering as is here if you believe the semantics are OK -- e.g. qʷʰ vs now qʰʷ.

I was only trying to make it reflect our conventions (e.g. length marker to the end). We can ship this off for 2.0 as is, I think. (But then we might want to update our conventions site.)

@xrotwang the current CLDF dump reflects this PR

@drammock
Member Author

drammock commented Mar 9, 2019 via email

@bambooforest
Contributor

thanks @drammock. i looked at the basics here and double-checked the stuff that @xrotwang found. looks good. agree there are some issues to discuss and some specifics to look into in detail. here are some cheap checks:

https://github.com/bambooforest/phoible-scripts/blob/master/tests/test-features.md

one thing i don't understand is how you tested against the previous segment-feature vectors (given the reordering of characters; otherwise i would have included that check here)
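One possible way to do that comparison despite the reordering (not necessarily what was done in this PR) is to join the old and new tables on an order-insensitive key, such as the segment's code points sorted numerically. Everything in this sketch is an assumption for illustration: the `canonical_key` helper, the two toy tables, and the feature columns `labial` and `spreadGlottis`.

```r
# Order-insensitive key: the segment's code points, sorted numerically.
canonical_key <- function(segment) {
  vapply(segment, function(s) {
    intToUtf8(sort(utf8ToInt(enc2utf8(s))))
  }, character(1), USE.NAMES = FALSE)
}

# Toy old/new tables that differ only in modifier order (qʷʰ vs qʰʷ).
old <- data.frame(segment = "q\u02B7\u02B0", labial = "+", spreadGlottis = "+",
                  stringsAsFactors = FALSE)
new <- data.frame(segment = "q\u02B0\u02B7", labial = "+", spreadGlottis = "+",
                  stringsAsFactors = FALSE)

old$key <- canonical_key(old$segment)
new$key <- canonical_key(new$segment)

# Join on the key, then compare each feature column old vs. new.
merged <- merge(old, new, by = "key", suffixes = c(".old", ".new"))
feature_cols <- c("labial", "spreadGlottis")
mismatch <- as.matrix(merged[paste0(feature_cols, ".old")]) !=
  as.matrix(merged[paste0(feature_cols, ".new")])
```

The caveat is that two genuinely different segments built from the same characters would share a key, so duplicated keys should be checked by hand before trusting the join.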

@bambooforest bambooforest merged commit 0ec7201 into phoible:master Mar 10, 2019
@drammock drammock deleted the autobuild-features branch March 11, 2019 17:32
@drammock drammock mentioned this pull request Jan 17, 2022

Development

Successfully merging this pull request may close these issues.

Conventionalize EA segments
features: combining plus sign below
features: palatal diacritic

2 participants